Abstract

Background: With the advancement of genotyping technologies, whole genome and high-density SNP markers 
have been widely used for genotyping of mapping populations and for characterization of germplasm lines in 
many crops. Before conducting SNP data analysis, it is necessary to check the individuals to ensure the integrity of 
lines for further data analysis. 

Results: We have developed an R package to conduct a parent-offspring test of individuals which are genotyped 
with a fixed set of SNP markers for further genetic studies. The program uses monomorphic SNP loci between 
parents and their progeny genotypes to calculate the similarity between each offspring and their parents. Based on 
the similarity of parents and individual offspring, the users can determine the threshold level for the individuals to 
be included for further data analysis. We used an F5-derived soybean population of '5601T' × PI 157440 that was
genotyped with 1,536 SNPs to illustrate the procedure and its application.

Conclusions: The R package 'ParentOffspring' coupled with the available SNP genotyping platforms could be used 
to detect the possible variants in a specific cross, as well as the potential errors in sample handling and genotyping 
processes. It can be used in any crop which is genotyped with a fixed set of SNP markers. 
Background

Single nucleotide polymorphism (SNP) genotyping plat- 
forms including Invader® assay, single base extension 
(SBE), oligonucleotide ligation assay (OLA) SNPlex™ sys- 
tem, and the Illumina GoldenGate™ and Infinium™ assays 
[1] have been developed and widely used to genotype 
crop plants with a fixed set of SNP markers [2-8]. The 
SNP data generated from these platforms have been ex- 
tensively used for genetic and genomic studies including 
QTL mapping, germplasm characterization, association 
mapping, and molecular breeding. With large amounts of
SNP data available for analysis, the integrity of the
individuals included in an experiment is very important to
ensure the accuracy of the results. For example, in a typical
QTL mapping population or progeny derived from a
cross, variants could result from outcrossing, seed mixing,
sampling errors, and many other causes during
population development and sample handling.
When variants are present, they could distort the
experimental results. Although morphological and physiological
characteristics can be used to distinguish the variants in a
population or seed lot, their usefulness is limited by the
availability of phenotypes and the accuracy of their determination. With
the high density of SNP markers assayed
on these advanced genotyping platforms, monomorphic
marker loci could be used to detect outcrossing or seed
mixture of individuals and possible genotyping errors
during sampling and genotyping processes by comparing the
SNP alleles of these progeny with their parents' alleles. In
soybean, the Universal Soy Linkage Panel (USLP 1.0) con- 
sisting of 1,536 SNPs on the Illumina GoldenGate® Plat- 
form has been developed and used for quantitative trait 
locus (QTL) discovery [6]. Recently, the iSelect Infinium
assay, which contains over 50,000 SNPs from the soybean
genome, has also been developed [9]. Yan et al. (2009)
genotyped 632 inbred maize lines with 1,536 SNPs. Over
200 barley germplasm lines, including European and U.S.
breeding materials, were genotyped with 3,072 SNPs on
Illumina GoldenGate assays that are available to the barley
community [2]. Similarly, 1,536 SNPs on Illumina GoldenGate
assays were developed to fingerprint 478 spring and
winter wheat lines [8]. A custom-designed Affymetrix
array consisting of 44,100 SNPs was used in rice to study
the genetic architecture of aluminum tolerance in a
bi-parental population and a set of 383 diverse rice
accessions [10,11].

Table 1 An example of assigning numerical (0, 1, 2)
scores based on the genotypes of progeny and their
parents (x represents an SNP allele from both parent 1
and parent 2 and y indicates that the SNP allele is not
from parent 1 or parent 2)

Parent 1    Parent 2    Progeny    Score
xx          xx          xx         2
xx          xx          xy         1
xx          xx          yy         0

With such advanced genotyping technologies, massive
amounts of SNP data, including both polymorphic and
monomorphic loci, have been generated. Typically, the
monomorphic loci are excluded from the data set before
further analyses. For example, genotyping a bi-parental
population in soybean using the USLP 1.0 panel of 1,536
SNP loci will result in around 1,000 monomorphic loci,
which could be used to test whether all progeny are truly
derived from the same parents. Here we developed an R
package to calculate the similarity between each offspring
and its parents using monomorphic SNP loci. This program
can be used by researchers for quality control of the
progeny and to determine which offspring need to be
excluded from further analysis.

Figure 1 Distribution of monomorphic SNP markers on 20 soybean chromosomes generated from the 5601T × PI 157440 population.

Figure 2 Distribution of parent-offspring similarity values.

Implementation 

To calculate the similarity of each offspring to its
parents, the allele calls of each offspring at each parental
monomorphic SNP locus are assigned a score of 0, 1, or 2
by comparison with the parental genotype. A score of "2"
is assigned when an offspring has the same alleles as both
parents; a score of "1" indicates that the offspring
possesses one parental allele and one non-parental allele;
and a score of "0" indicates that both alleles of the
offspring differ from those of its parents. An example of
the assignments is presented in Table 1.
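
The scoring rule of Table 1 can be sketched in a few lines. This is a minimal illustration in Python, not the code of the R package: the function name `score_locus` and the two-character genotype encoding (for example "AG") are assumptions made for this example, and missing calls are not handled.

```python
def score_locus(parent_allele: str, progeny_genotype: str) -> int:
    """Score one parental monomorphic locus as 0, 1, or 2 (Table 1).

    parent_allele: the allele shared by both parents (hypothetical
        single-letter encoding such as "A").
    progeny_genotype: two-letter allele call of the offspring, e.g. "AG".
    """
    if len(progeny_genotype) != 2:
        raise ValueError("expected a two-allele genotype call such as 'AA'")
    # The score is the number of progeny alleles matching the parental allele.
    return sum(1 for allele in progeny_genotype if allele == parent_allele)


print(score_locus("A", "AA"))  # both alleles parental -> 2
print(score_locus("A", "AG"))  # one parental, one non-parental -> 1
print(score_locus("A", "GG"))  # no parental allele -> 0
```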

The similarity between an offspring and its parents is
calculated from all monomorphic loci as follows:

S = (2a + b) / (2a + 2b + 2c)

where S = the similarity between an offspring and its
parents, a = the number of markers with a score of 2,
b = the number of markers with a score of 1, and c =
the number of markers with a score of 0. The R package
is available at http://cran.r-project.org/web/packages/ParentOffspring/index.html.
The output files of this program include both the
monomorphic and the polymorphic SNP marker data sets
for further analyses.

Figure 3 QTL analysis using single factor analysis for bacterial pustule resistance on seven soybean chromosomes at similarity
thresholds of 0%, 90%, 95%, and 99%.

Theoretically, an offspring should have 100% similarity
to its parents at the parental monomorphic loci. However,
because of possible genotyping errors, outcrossing, seed
mixture, and other unknown reasons, the similarity of an
offspring to its parents could be less than 100%. The lower
the similarity, the more likely it is that the offspring is
a variant of the specific cross.
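
The similarity S defined above can be computed directly from the per-locus scores. The following Python sketch is for illustration only and is not taken from the R package; the function name `similarity` and the example counts (95, 3, and 2 loci) are hypothetical.

```python
def similarity(scores):
    """Return S = (2a + b) / (2a + 2b + 2c) for a list of 0/1/2 locus scores."""
    a = scores.count(2)  # loci where both progeny alleles are parental
    b = scores.count(1)  # loci with one parental and one non-parental allele
    c = scores.count(0)  # loci with no parental allele
    denominator = 2 * a + 2 * b + 2 * c
    if denominator == 0:
        raise ValueError("no scored loci")
    return (2 * a + b) / denominator


# Hypothetical offspring scored at 100 monomorphic loci.
example_scores = [2] * 95 + [1] * 3 + [0] * 2
print(similarity(example_scores))  # (2*95 + 3) / 200 = 0.965
```

Because 2a + b is the sum of all scores and a + b + c is the number of scored loci, S is simply the mean score divided by 2.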

Results and discussion

An F5-derived recombinant inbred line (RIL) soybean
population was used as an example to illustrate the procedure
and its application. The population was developed from a
cross of 5601T × PI 157440 at the University of Georgia
Plant Sciences Farm, Athens, Georgia. 5601T is a cultivar
developed and released by the University of Tennessee [12],
and PI 157440 was selected as a parent based on its high
canopy photosynthetic capacity during the reproductive
period [13]. The F1 plants were selfed to produce F2
plants. Seeds from individual F2 plants were advanced to
the F5 generation using a single-seed descent method [14].
The F5 plants were grown at the University of Georgia
Plant Sciences Farm and, at maturity, individual F5 plants
were harvested to create 150 F5:6 lines. The soybean USLP
1.0 panel of 1,536 SNP markers on the GoldenGate®
platform [6] was used to fingerprint these 150 RILs
and their parents. The SNP allele calls were performed
on the Illumina BeadStation 500G (Illumina, San Diego,
CA). The population was evaluated for its reaction to
bacterial pustule disease (Xanthomonas campestris pv.
glycines) under field conditions at the UGA Plant Sciences
Farm during 2010 in a randomized complete block design
with two replications. Based on the severity of leaf
symptoms, the lines were visually rated for bacterial pustule
reaction on a plot basis on a scale of 1 to 5, where
plots with no symptoms were rated as 1 (resistant)
and plots with severe symptoms as 5 (susceptible).
Associations of the SNP markers with the bacterial pustule
ratings were tested using single-factor analysis in SAS 9.3
[15].

Of the 1,536 SNP markers assayed in the 5601T × PI 157440
population, 542 SNPs (37%) were polymorphic and 938 SNPs
(63%) were monomorphic; these percentages are relative to
the 1,480 SNPs that fall into one of the two classes. The
monomorphic SNPs were evenly distributed across all
chromosomes (Figure 1).
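
Separating monomorphic from polymorphic loci requires only the two parental genotypes. The Python sketch below shows one way to do this and is not the package's implementation. It assumes two-letter genotype calls, treats a locus as monomorphic only when both parents are homozygous for the same allele (as in Table 1), and sets aside heterozygous or missing parental calls as "unclassified". The marker names and calls are hypothetical.

```python
from collections import Counter


def classify_locus(parent1: str, parent2: str) -> str:
    """Classify a locus from the two parental genotype calls."""

    def is_homozygous(call: str) -> bool:
        # A usable call has two identical nucleotide letters, e.g. "AA".
        return len(call) == 2 and call[0] == call[1] and call[0] in "ACGT"

    if not (is_homozygous(parent1) and is_homozygous(parent2)):
        return "unclassified"  # assumption: heterozygous or missing calls are set aside
    return "monomorphic" if parent1 == parent2 else "polymorphic"


# Hypothetical parental calls for five markers.
parents = {
    "snp1": ("AA", "AA"),
    "snp2": ("AA", "GG"),
    "snp3": ("CC", "CC"),
    "snp4": ("AG", "AA"),
    "snp5": ("--", "TT"),
}
counts = Counter(classify_locus(p1, p2) for p1, p2 in parents.values())
print(counts["monomorphic"], counts["polymorphic"], counts["unclassified"])  # 2 1 2
```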

When the similarity threshold was set at 90%, five
genotypes were declared variants; at a similarity threshold
of 95%, the number of variants was 18; and at the more
stringent threshold of 99%, the number of variants
reached 70 (Figure 2). The appropriate similarity
threshold depends on many factors, such as
objectives, line types, and genotyping platforms. It is
expected that excluding the variants from a population will
eliminate outliers from further genetic analysis.
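
Applying a threshold amounts to flagging every line whose similarity falls below it. The Python sketch below illustrates this; the function name `flag_variants`, the line names, and the similarity values are hypothetical and are not data from this study.

```python
def flag_variants(similarities, threshold):
    """Return the names of lines whose similarity is below the threshold."""
    return sorted(name for name, s in similarities.items() if s < threshold)


# Hypothetical similarity values for five lines.
sims = {
    "RIL001": 1.000,
    "RIL002": 0.993,
    "RIL003": 0.962,
    "RIL004": 0.941,
    "RIL005": 0.874,
}
for threshold in (0.90, 0.95, 0.99):
    print(threshold, flag_variants(sims, threshold))
```

Raising the threshold can only enlarge the set of flagged lines, which is why the number of declared variants grows from the 90% to the 99% threshold.
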
To demonstrate the method, the polymorphic SNP
marker data from the 150 RILs of the 5601T × PI 157440
population were used to detect QTL associated with
resistance to bacterial pustule disease using a single factor
analysis approach. Single factor analysis identified a
major QTL on chromosome 17 that accounted for 32.7% of
the phenotypic variation, in agreement with the
report by Narvel et al. [16]. At similarity
thresholds of 90, 95, and 99% (Figure 3), 5, 18, and
70 lines, respectively, were excluded from the data set
for analysis, and the R² for the major QTL on chromosome
17 became 34.8, 37.5, and 45.6%, respectively (Figure 3).
This indicates that quality control of the offspring using
the monomorphic SNP markers can help improve
the genetic analyses and thus the accuracy of the results.
Based on our data, we suggest using a similarity threshold
of 90-95% in a study.
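
The effect of removing variants on R² can be illustrated with a toy example. The Python sketch below is not the SAS procedure used in this study. It assumes that the R² of a single-factor analysis is the between-genotype sum of squares divided by the total sum of squares (one-way ANOVA), and all genotypes and disease ratings are invented.

```python
from collections import defaultdict


def single_factor_r2(genotypes, phenotypes):
    """R^2 of a one-way ANOVA of phenotype on marker genotype class."""
    if len(genotypes) != len(phenotypes) or not phenotypes:
        raise ValueError("genotypes and phenotypes must be non-empty and aligned")
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)
    grand_mean = sum(phenotypes) / len(phenotypes)
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    if ss_total == 0:
        raise ValueError("phenotype has no variation")
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return ss_between / ss_total


# Invented data: the fourth line carries the "AA" marker genotype but has an
# atypical rating, as an off-type line might.
genotypes = ["AA", "AA", "AA", "AA", "BB", "BB", "BB"]
ratings = [1, 2, 1, 5, 4, 5, 4]
r2_all = single_factor_r2(genotypes, ratings)

# Drop the suspected variant (index 3) and recompute.
keep = [i for i in range(len(ratings)) if i != 3]
r2_filtered = single_factor_r2([genotypes[i] for i in keep], [ratings[i] for i in keep])
print(round(r2_all, 3), round(r2_filtered, 3))
```

In this invented example the marker explains more of the phenotypic variation once the off-type line is removed, which is the same direction of change reported above.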

Conclusions 

The R package 'ParentOffspring' was developed to con- 
duct a parent-offspring test of individuals which are geno- 
typed with a fixed set of SNP markers for further genetic 
studies. The application of the R package coupled with the 
available SNP genotyping platforms could be used to de- 
tect the possible variants in a specific cross, as well as the 
potential errors in sample handling and genotyping pro- 
cesses. It can be used in any crop which is genotyped with 
a fixed set of SNP markers. 

Availability and requirements 

Project name: ParentOffspring project 

Project home page: http://cran.r-project.org/web/packages/ParentOffspring/

Operating system(s): Windows, Mac OS, Linux 
Programming language: R 
Other requirements: R version 2.15.1 or higher 
License: GPL-2 | GPL-3 

Any restrictions to use by non-academics: None.